Burr Distribution (burr, Burr Type III / Dagum)#

The Burr (Type III) distribution (called the Dagum distribution in economics) is a flexible family on positive real values with polynomial (heavy) tails.

It is a good default when:

  • your data are strictly positive (income, sizes, waiting times, lifetimes)

  • the right tail can be much heavier than that of a Lognormal or a Gamma

  • you want a model with interpretable tail behavior and an easy sampler


Learning goals#

By the end you should be able to:

  • write the PDF/CDF/quantile function and interpret the parameters

  • compute moments (and know when they do not exist)

  • derive expectation/variance and the likelihood

  • sample from burr using inverse-transform sampling (NumPy-only)

  • fit and use the distribution via scipy.stats.burr

Notation#

  • Shape parameters: \(c > 0\), \(d > 0\)

  • Random variable: \(X \sim \mathrm{BurrIII}(c, d)\)

  • Standard support: \(x > 0\)


Table of contents#

  1. Title & Classification

  2. Intuition & Motivation

  3. Formal Definition

  4. Moments & Properties

  5. Parameter Interpretation

  6. Derivations

  7. Sampling & Simulation

  8. Visualization

  9. SciPy Integration

  10. Statistical Use Cases

  11. Pitfalls

  12. Summary

import numpy as np
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

import scipy
from scipy import stats
from scipy.special import gammaln, psi, logsumexp

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)

rng = np.random.default_rng(42)

print("numpy:", np.__version__)
print("scipy:", scipy.__version__)
print("plotly:", plotly.__version__)
numpy: 1.26.2
scipy: 1.15.0
plotly: 6.5.2

1) Title & Classification#

  • Name: burr (Burr Type III; also known as the Dagum distribution)

  • Type: continuous

  • Standard support: \(x > 0\) (SciPy defines the PDF for \(x \ge 0\); depending on parameters, the density can diverge as \(x \to 0^+\))

  • Parameter space (standard form): \(c > 0\), \(d > 0\)

  • Location/scale form (SciPy): \(X = \mathrm{loc} + \mathrm{scale}\cdot Y\) with \(Y \sim \mathrm{BurrIII}(c, d)\)

    • Support becomes \(x > \mathrm{loc}\)

    • \(\mathrm{scale} > 0\)

2) Intuition & Motivation#

What it models#

The Burr Type III/Dagum family is a positive, right-skewed, heavy-tailed model. It can represent distributions where:

  • very small values are possible (how much mass sits near 0 is governed by the product \(c d\))

  • extremely large values occasionally occur

  • the right tail decays like a power law

Typical real-world use cases#

  • Income/wealth modeling (Dagum distribution is common in economics)

  • Insurance claim sizes and other positive heavy-tailed costs

  • Reliability / lifetime modeling when failures can occur very late

  • Hydrology and environmental quantities with occasional extremes

Relations to other distributions#

  • Log-logistic (Fisk): when \(d = 1\), \(F(x) = \frac{1}{1 + x^{-c}}\) and \(\log X\) is logistic with scale \(1/c\).

  • Reciprocal relationship to Burr XII: if \(X \sim \mathrm{BurrIII}(c,d)\), then \(1/X \sim \mathrm{BurrXII}(c,d)\). SciPy provides both scipy.stats.burr (Type III/Dagum) and scipy.stats.burr12 (Type XII).

  • Key transformation: \(T = X^{-c}\) follows a Lomax (Pareto II) distribution with shape \(d\). This turns many calculations into Beta/Gamma-function identities. (Both relations are spot-checked numerically below.)
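
These relations are easy to spot-check numerically. A minimal sketch (parameter values here are arbitrary) comparing CDFs via SciPy's burr, burr12, and lomax:

# Spot-check: 1/X ~ Burr XII and X^{-c} ~ Lomax, via CDF identities.
c_chk, d_chk = 2.5, 1.7
y = np.array([0.3, 1.0, 2.0, 7.0])

# If X ~ BurrIII(c, d) then F_{1/X}(y) = 1 - F_X(1/y), which should match burr12.
diff12 = stats.burr12.cdf(y, c_chk, d_chk) - (1.0 - stats.burr.cdf(1.0 / y, c_chk, d_chk))
print("max |Burr XII identity error|:", np.max(np.abs(diff12)))

# If T = X^{-c} then F_T(t) = 1 - F_X(t^{-1/c}), which should match Lomax(shape=d).
t = np.array([0.1, 0.5, 2.0, 10.0])
diff_lomax = stats.lomax.cdf(t, d_chk) - (1.0 - stats.burr.cdf(t ** (-1.0 / c_chk), c_chk, d_chk))
print("max |Lomax identity error|:", np.max(np.abs(diff_lomax)))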

3) Formal Definition#

CDF#

In the standard (unshifted, unit-scale) parameterization, for \(x>0\):

\[ F(x; c, d) = \left(1 + x^{-c}\right)^{-d}, \qquad c>0,\ d>0. \]

PDF#

Differentiating the CDF gives the density:

\[ f(x; c, d) = \frac{\partial}{\partial x} \left(1 + x^{-c}\right)^{-d} = c d\, \frac{x^{-c-1}}{\left(1 + x^{-c}\right)^{d+1}}, \qquad x>0. \]

Quantile function (inverse CDF)#

Let \(p \in (0,1)\). Solving \(p = (1 + x^{-c})^{-d}\) gives:

\[ Q(p) = \left(p^{-1/d} - 1\right)^{-1/c}. \]

Location/scale#

SciPy uses the standard location/scale convention:

\[ X = \mathrm{loc} + \mathrm{scale}\cdot Y, \qquad Y \sim \mathrm{BurrIII}(c,d),\ \mathrm{scale}>0. \]
def burr_logpdf(x, c, d):
    """Log-PDF of the standard Burr Type III / Dagum distribution (loc=0, scale=1)."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, -np.inf, dtype=float)
    mask = x > 0
    xm = x[mask]
    logx = np.log(xm)
    # log(1 + x^{-c}) = log(1 + exp(-c log x)) computed stably
    log1p_xnegc = np.logaddexp(0.0, -c * logx)
    out[mask] = (
        np.log(c)
        + np.log(d)
        + (-c - 1.0) * logx
        - (d + 1.0) * log1p_xnegc
    )
    return out


def burr_pdf(x, c, d):
    return np.exp(burr_logpdf(x, c, d))


def burr_logcdf(x, c, d):
    """Log-CDF of the standard Burr Type III / Dagum distribution."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, -np.inf, dtype=float)
    mask = x > 0
    xm = x[mask]
    logx = np.log(xm)
    log1p_xnegc = np.logaddexp(0.0, -c * logx)
    out[mask] = -d * log1p_xnegc
    return out


def burr_cdf(x, c, d):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x, dtype=float)
    mask = x > 0
    out[mask] = np.exp(burr_logcdf(x[mask], c, d))
    return out


def burr_ppf(p, c, d):
    """Quantile function Q(p) for p in [0,1]."""
    p = np.asarray(p, dtype=float)
    x = np.full_like(p, np.nan, dtype=float)
    x[p == 0] = 0.0
    x[p == 1] = np.inf
    mask = (p > 0) & (p < 1)
    # p^{-1/d} - 1 = exp(-log p / d) - 1; expm1 is stable when p ~ 1.
    t = np.expm1(-np.log(p[mask]) / d)
    x[mask] = np.power(t, -1.0 / c)
    return x


def burr_rvs_numpy(c, d, size, rng=None):
    """NumPy-only sampler via inverse-transform sampling."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(size)
    return burr_ppf(u, c, d)


def burr_raw_moment(k, c, d):
    """Raw moment E[X^k] for k < c; returns +inf when the moment diverges."""
    if k >= c:
        return np.inf
    return np.exp(gammaln(1.0 - k / c) + gammaln(d + k / c) - gammaln(d))


def burr_entropy(c, d):
    """Differential entropy of the standard Burr Type III / Dagum distribution."""
    return -np.log(c * d) - (1.0 + 1.0 / c) * (psi(1.0) - psi(d)) + 1.0 + 1.0 / d


def burr_summary_stats(c, d):
    """Mean/variance/skewness/excess kurtosis (when finite), else nan/inf."""
    mean = burr_raw_moment(1.0, c, d) if c > 1 else np.inf
    if c <= 2:
        return mean, np.inf, np.nan, np.nan

    m2 = burr_raw_moment(2.0, c, d)
    var = m2 - mean**2

    skew = np.nan
    exkurt = np.nan

    if c > 3:
        m3 = burr_raw_moment(3.0, c, d)
        mu3 = m3 - 3 * m2 * mean + 2 * mean**3
        skew = mu3 / (var ** 1.5)

    if c > 4:
        m3 = burr_raw_moment(3.0, c, d)  # defined since c>4
        m4 = burr_raw_moment(4.0, c, d)
        mu4 = m4 - 4 * m3 * mean + 6 * m2 * mean**2 - 3 * mean**4
        exkurt = mu4 / (var**2) - 3.0

    return mean, var, skew, exkurt

4) Moments & Properties#

Existence of moments (key takeaway)#

The right tail behaves like a power law:

\[ \Pr(X > x) = 1 - F(x) \sim d\,x^{-c}\quad \text{as } x\to\infty. \]

So the \(k\)-th moment exists iff \(k < c\) (independent of \(d\)).
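A quick numerical check of this asymptotic (the survival function is computed stably in the log domain; parameter values are arbitrary):

# Tail check: Pr(X > x) / (d * x^{-c}) -> 1 as x -> infinity.
# sf(x) = 1 - F(x) computed stably as -expm1(-d * log1p(x^{-c})).
c_t, d_t = 3.5, 2.0
for x_big in [1e1, 1e2, 1e3, 1e4]:
    sf = -np.expm1(-d_t * np.log1p(x_big ** (-c_t)))
    print(f"x = {x_big:8.0e}: sf(x) / (d x^-c) = {sf / (d_t * x_big ** (-c_t)):.8f}")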

Raw moments#

For \(k < c\):

\[ \mathbb{E}[X^k] = \frac{\Gamma\left(1 - \tfrac{k}{c}\right)\,\Gamma\left(d + \tfrac{k}{c}\right)}{\Gamma(d)}. \]

Mean and variance#

  • Mean (exists for \(c>1\)): \(\mathbb{E}[X] = \frac{\Gamma\left(1 - \tfrac{1}{c}\right)\,\Gamma\left(d + \tfrac{1}{c}\right)}{\Gamma(d)}\)

  • Second moment (exists for \(c>2\)): \(\mathbb{E}[X^2] = \frac{\Gamma\left(1 - \tfrac{2}{c}\right)\,\Gamma\left(d + \tfrac{2}{c}\right)}{\Gamma(d)}\)

  • Variance (exists for \(c>2\)): \(\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2\)

Skewness and kurtosis#

  • Skewness exists for \(c>3\)

  • Excess kurtosis exists for \(c>4\)

You can compute them from raw moments \(m_k = \mathbb{E}[X^k]\) via standard formulas.

MGF / characteristic function#

  • The MGF \(M(t)=\mathbb{E}[e^{tX}]\) diverges for any \(t>0\) because the tail is polynomial.

  • The characteristic function \(\varphi(t)=\mathbb{E}[e^{itX}]\) exists for all real \(t\), but does not have a simple elementary closed form (it can be expressed using special functions or evaluated numerically; a Monte Carlo sketch follows).
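
A Monte Carlo estimate over samples is one easy way to evaluate \(\varphi(t)\) numerically. A minimal sketch (sample size and parameters are arbitrary):

# Monte Carlo estimate of the characteristic function phi(t) = E[exp(i t X)].
rng_cf = np.random.default_rng(123)
c_cf, d_cf = 3.5, 2.0
x_cf = burr_rvs_numpy(c_cf, d_cf, size=200_000, rng=rng_cf)
for t in [0.5, 1.0, 2.0]:
    phi = np.mean(np.exp(1j * t * x_cf))
    print(f"t = {t}: phi(t) ~ {phi:.4f}")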

Entropy#

The (differential) entropy has a closed form:

\[ h(X) = -\log(cd) - \left(1 + \frac{1}{c}\right)\left(\psi(1) - \psi(d)\right) + 1 + \frac{1}{d}, \]

where \(\psi\) is the digamma function.

c0, d0 = 3.5, 2.0
x_test = np.array([0.2, 0.5, 1.0, 2.0, 5.0])

pdf_np = burr_pdf(x_test, c0, d0)
pdf_sp = stats.burr.pdf(x_test, c0, d0)
print("max |pdf_numpy - pdf_scipy|:", np.max(np.abs(pdf_np - pdf_sp)))

mean, var, skew, exkurt = burr_summary_stats(c0, d0)
mean_sp, var_sp, skew_sp, exkurt_sp = stats.burr.stats(c0, d0, moments="mvsk")
print("mean:", mean, "(scipy:", float(mean_sp), ")")
print("var:", var, "(scipy:", float(var_sp), ")")
print("skew:", skew, "(scipy:", float(skew_sp), ")")
print("excess kurtosis:", exkurt, "(scipy:", float(exkurt_sp), ")")

h = burr_entropy(c0, d0)
print("entropy:", h, "(scipy:", float(stats.burr.entropy(c0, d0)), ")")

print("\nMoment existence demo (moment finite iff k < c):")
for k in [0.5, 1.0, 2.0, 3.0, 4.0]:
    m = burr_raw_moment(k, c0, d0)
    print(f"E[X^{k}] = {m}")
max |pdf_numpy - pdf_scipy|: 3.3306690738754696e-16
mean: 1.4760910375888234 (scipy: 1.4760910375888239 )
var: 0.7147250598130057 (scipy: 0.7147250598130044 )
skew: 8.514406493629222 (scipy: 8.514406493629242 )
excess kurtosis: nan (scipy: nan )
entropy: 0.8398041366589724 (scipy: 0.8398041366582019 )

Moment existence demo (moment finite iff k < c):
E[X^0.5] = 1.1821440631620528
E[X^1.0] = 1.4760910375888234
E[X^2.0] = 2.893569811063055
E[X^3.0] = 11.525904615830015
E[X^4.0] = inf

5) Parameter Interpretation#

The parameters \(c\) and \(d\) both affect shape, but in different ways.

\(c\) (tail index)#

  • Controls the right-tail exponent: \(\Pr(X>x) \sim d\,x^{-c}\).

  • Larger \(c\) means a lighter tail and more finite moments.

    • mean exists for \(c>1\)

    • variance exists for \(c>2\)

\(d\) (body / lower-tail behavior)#

  • Controls the body and behavior near 0.

  • As \(x\to 0^+\), \(F(x) \approx x^{c d}\) and \(f(x) \approx c d\,x^{c d - 1}\) (verified numerically after this list).

    • If \(c d \ge 1\), the PDF is finite at 0.

    • If \(c d < 1\), the PDF diverges at 0.
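
A quick check of the near-zero approximation \(F(x) \approx x^{cd}\) (parameters chosen so that \(c d < 1\)):

# Near-zero check: F(x) / x^(c d) -> 1 as x -> 0+ (here c*d = 0.75 < 1).
c_z, d_z = 3.0, 0.25
for x_small in [1e-2, 1e-4, 1e-6]:
    ratio = burr_cdf(np.array([x_small]), c_z, d_z)[0] / x_small ** (c_z * d_z)
    print(f"x = {x_small:.0e}: F(x) / x^(cd) = {ratio:.8f}")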

A useful mental model comes from the transformation \(T = X^{-c}\):

  • \(T\) follows a Lomax distribution with shape \(d\).

  • Then \(X = T^{-1/c}\) stretches/compresses that Lomax behavior on a log scale.

x = np.logspace(-3, 3, 800)

# Effect of changing c (tail index)
d_fixed = 2.0
c_list = [1.2, 2.0, 5.0]

fig_pdf_c = go.Figure()
for c in c_list:
    y = np.maximum(burr_pdf(x, c, d_fixed), 1e-300)
    fig_pdf_c.add_trace(
        go.Scatter(x=x, y=y, mode="lines", name=f"c={c}, d={d_fixed}")
    )

fig_pdf_c.update_layout(title="PDF shape when varying c (d fixed)")
fig_pdf_c.update_xaxes(type="log", title="x")
fig_pdf_c.update_yaxes(type="log", title="pdf(x)")
fig_pdf_c.show()

# Effect of changing d (body / lower-tail behavior)
c_fixed = 3.0
d_list = [0.5, 1.0, 3.0]

fig_pdf_d = go.Figure()
for d in d_list:
    y = np.maximum(burr_pdf(x, c_fixed, d), 1e-300)
    fig_pdf_d.add_trace(
        go.Scatter(x=x, y=y, mode="lines", name=f"c={c_fixed}, d={d}")
    )

fig_pdf_d.update_layout(title="PDF shape when varying d (c fixed)")
fig_pdf_d.update_xaxes(type="log", title="x")
fig_pdf_d.update_yaxes(type="log", title="pdf(x)")
fig_pdf_d.show()

# CDF view (often easier to interpret)
fig_cdf = go.Figure()
for c in c_list:
    fig_cdf.add_trace(
        go.Scatter(x=x, y=burr_cdf(x, c, d_fixed), mode="lines", name=f"c={c}, d={d_fixed}")
    )
fig_cdf.update_layout(title="CDF when varying c (d fixed)")
fig_cdf.update_xaxes(type="log", title="x")
fig_cdf.update_yaxes(title="cdf(x)")
fig_cdf.show()

6) Derivations#

A clean way to derive many results is to transform \(X\) into a Lomax random variable.

Step 1: show \(T = X^{-c}\) is Lomax#

Let \(T = X^{-c}\). For \(t>0\):

\[ \Pr(T \le t) = \Pr(X^{-c} \le t) = \Pr\left(X \ge t^{-1/c}\right) = 1 - F\left(t^{-1/c}\right). \]

But

\[ F\left(t^{-1/c}\right) = \left(1 + (t^{-1/c})^{-c}\right)^{-d} = (1 + t)^{-d}. \]

So

\[ \Pr(T \le t) = 1 - (1+t)^{-d}, \]

which is exactly the Lomax/Pareto-II CDF with shape \(d\) and unit scale.

Step 2: derive \(\mathbb{E}[X^k]\)#

Since \(X = T^{-1/c}\),

\[ \mathbb{E}[X^k] = \mathbb{E}[T^{-k/c}]. \]

With Lomax density \(g(t) = d (1+t)^{-d-1}\) for \(t>0\),

\[ \mathbb{E}[X^k] = d\int_0^\infty t^{-k/c}(1+t)^{-d-1}\,dt. \]

This is a Beta-function integral, convergent when \(-k/c > -1\) and \(d + k/c > 0\); for \(k > 0\) this reduces to \(k < c\):

\[ \int_0^\infty t^{a}(1+t)^{-d-1}\,dt = \mathrm{B}(a+1, d-a) \]

with \(a = -k/c\). Substituting and simplifying yields

\[ \mathbb{E}[X^k] = \frac{\Gamma\left(1 - \tfrac{k}{c}\right)\,\Gamma\left(d + \tfrac{k}{c}\right)}{\Gamma(d)},\qquad k<c. \]
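The Beta-integral step can be confirmed numerically, e.g. with scipy.integrate.quad (a sketch; the split at \(t=1\) keeps the endpoint singularity and the infinite tail in separate calls):

from scipy import integrate

# Numerical check of the identity E[X^k] = d * Integral_0^inf t^{-k/c} (1+t)^{-d-1} dt.
c_q, d_q, k_q = 3.5, 2.0, 1.0
f = lambda t: t ** (-k_q / c_q) * (1.0 + t) ** (-d_q - 1.0)
i1, _ = integrate.quad(f, 0.0, 1.0)       # integrable singularity at t = 0
i2, _ = integrate.quad(f, 1.0, np.inf)
print("d * integral :", d_q * (i1 + i2))
print("Gamma formula:", burr_raw_moment(k_q, c_q, d_q))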

Variance#

Plug \(k=1\) and \(k=2\) into the raw-moment formula and use

\[\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2\]

(finite only when \(c>2\)).

Likelihood#

Given i.i.d. data \(x_1,\dots,x_n\) (all \(>0\)) in the standard form, the log-likelihood is:

\[ \ell(c,d) = \sum_{i=1}^n \log f(x_i;c,d) = n(\log c + \log d) - (c+1)\sum_{i=1}^n \log x_i - (d+1)\sum_{i=1}^n \log(1 + x_i^{-c}). \]

There is no closed-form MLE for \((c,d)\) in general; numerical optimization is used in practice (see scipy.stats.burr.fit).

def burr_loglik(c, d, x):
    x = np.asarray(x, dtype=float)
    if (c <= 0) or (d <= 0) or np.any(x <= 0):
        return -np.inf
    return float(np.sum(burr_logpdf(x, c, d)))


# quick sanity check: log-likelihood is higher near the true parameters (on average)
c_true, d_true = 3.0, 2.0
x_data = burr_rvs_numpy(c_true, d_true, size=2000, rng=rng)

for (c_try, d_try) in [(2.0, 2.0), (3.0, 2.0), (4.0, 2.0), (3.0, 1.0), (3.0, 3.0)]:
    print((c_try, d_try), burr_loglik(c_try, d_try, x_data))
(2.0, 2.0) -2336.3535391062123
(3.0, 2.0) -2114.319013644667
(4.0, 2.0) -2273.9298182857347
(3.0, 1.0) -2514.0593433068734
(3.0, 3.0) -2289.942828886021
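
As an illustration of that numerical optimization, here is a minimal MLE sketch with scipy.optimize.minimize on the data simulated above; scipy.stats.burr.fit is the more robust tool in practice:

from scipy.optimize import minimize

# Minimal MLE sketch: maximize the log-likelihood over (log c, log d);
# the log-parameterization keeps c, d > 0 without explicit constraints.
def neg_loglik(theta, x):
    c, d = np.exp(theta)
    return -burr_loglik(c, d, x)

res = minimize(neg_loglik, x0=np.log([1.0, 1.0]), args=(x_data,), method="Nelder-Mead")
c_mle, d_mle = np.exp(res.x)
print("True (c,d):", (c_true, d_true))
print("MLE  (c,d):", (float(c_mle), float(d_mle)))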

7) Sampling & Simulation (NumPy-only)#

The standard Burr III CDF is

\[F(x) = (1 + x^{-c})^{-d}.\]

Using inverse-transform sampling:

  1. Draw \(U \sim \mathrm{Uniform}(0,1)\)

  2. Set \(U = (1 + X^{-c})^{-d}\)

  3. Solve: \(U^{-1/d} = 1 + X^{-c} \Rightarrow X = \left(U^{-1/d} - 1\right)^{-1/c}\)

This gives a fast sampler without any rejection steps.

c_samp, d_samp = 3.0, 2.0
samples = burr_rvs_numpy(c_samp, d_samp, size=50_000, rng=rng)

qs = np.array([0.1, 0.5, 0.9, 0.99])
q_emp = np.quantile(samples, qs)
q_theory = burr_ppf(qs, c_samp, d_samp)

print("Quantiles p:", qs)
print("Empirical:", q_emp)
print("Theory:", q_theory)

print("\nSample mean/var (finite here since c=3>2):")
print("mean:", samples.mean(), "(theory:", burr_raw_moment(1, c_samp, d_samp), ")")
print("var:", samples.var(), "(theory:", burr_raw_moment(2, c_samp, d_samp) - burr_raw_moment(1, c_samp, d_samp) ** 2, ")")
Quantiles p: [0.1  0.5  0.9  0.99]
Empirical: [0.775767 1.346368 2.6389   5.900616]
Theory: [0.773326 1.341504 2.644159 5.833366]

Sample mean/var (finite here since c=3>2):
mean: 1.6135746770341746 (theory: 1.612266101541527 )
var: 1.4486312796347882 (theory: 1.4312632716739029 )

8) Visualization#

We’ll visualize:

  • the theoretical PDF and CDF

  • a Monte Carlo sample (histogram + empirical CDF)

c_vis, d_vis = 3.0, 2.0
x_grid = np.logspace(-3, 3, 800)

# PDF
fig_pdf = go.Figure()
fig_pdf.add_trace(
    go.Scatter(x=x_grid, y=np.maximum(burr_pdf(x_grid, c_vis, d_vis), 1e-300), mode="lines", name="pdf")
)
fig_pdf.update_layout(title=f"Burr III PDF (c={c_vis}, d={d_vis})")
fig_pdf.update_xaxes(type="log", title="x")
fig_pdf.update_yaxes(type="log", title="pdf(x)")
fig_pdf.show()

# CDF
fig_cdf2 = go.Figure()
fig_cdf2.add_trace(go.Scatter(x=x_grid, y=burr_cdf(x_grid, c_vis, d_vis), mode="lines", name="cdf"))
fig_cdf2.update_layout(title=f"Burr III CDF (c={c_vis}, d={d_vis})")
fig_cdf2.update_xaxes(type="log", title="x")
fig_cdf2.update_yaxes(title="cdf(x)")
fig_cdf2.show()

# Monte Carlo samples: histogram + PDF overlay
samples_vis = burr_rvs_numpy(c_vis, d_vis, size=30_000, rng=rng)
fig_hist = px.histogram(
    samples_vis,
    nbins=80,
    histnorm="probability density",
    log_x=True,
    opacity=0.55,
    title=f"Monte Carlo histogram vs PDF (c={c_vis}, d={d_vis})",
)
fig_hist.add_trace(
    go.Scatter(x=x_grid, y=burr_pdf(x_grid, c_vis, d_vis), mode="lines", name="pdf")
)
fig_hist.update_xaxes(title="x")
fig_hist.update_yaxes(title="density")
fig_hist.show()

# Empirical CDF vs theoretical CDF
x_sorted = np.sort(samples_vis)
ecdf = np.arange(1, len(x_sorted) + 1) / len(x_sorted)

fig_ecdf = go.Figure()
fig_ecdf.add_trace(go.Scatter(x=x_sorted, y=ecdf, mode="lines", name="empirical CDF"))
fig_ecdf.add_trace(go.Scatter(x=x_grid, y=burr_cdf(x_grid, c_vis, d_vis), mode="lines", name="theoretical CDF"))
fig_ecdf.update_layout(title="Empirical vs theoretical CDF")
fig_ecdf.update_xaxes(type="log", title="x")
fig_ecdf.update_yaxes(title="CDF")
fig_ecdf.show()

9) SciPy Integration (scipy.stats.burr)#

SciPy provides a ready-to-use implementation:

  • stats.burr.pdf(x, c, d, loc=0, scale=1)

  • stats.burr.cdf(x, c, d, loc=0, scale=1)

  • stats.burr.rvs(c, d, loc=0, scale=1, size=..., random_state=...)

  • stats.burr.fit(data, ...) (MLE)

Reminder: SciPy’s burr is Burr Type III / Dagum. SciPy’s burr12 is Burr Type XII.

c_true, d_true = 3.0, 2.0

dist = stats.burr(c_true, d_true)  # loc=0, scale=1 by default
x_eval = np.array([0.5, 1.0, 2.0, 5.0])
print("pdf:", dist.pdf(x_eval))
print("cdf:", dist.cdf(x_eval))

# rvs
data = dist.rvs(size=3000, random_state=rng)
print("sample min/max:", data.min(), data.max())

# fit (fix loc=0, scale=1 to estimate only c and d)
c_hat, d_hat, loc_hat, scale_hat = stats.burr.fit(data, floc=0, fscale=1)
print("\nTrue (c,d):", (c_true, d_true))
print("Fit  (c,d):", (c_hat, d_hat))
print("Returned loc/scale:", (loc_hat, scale_hat))

# Compare numpy vs SciPy implementations numerically
x_dense = np.logspace(-3, 3, 1000)
max_pdf_diff = np.max(np.abs(burr_pdf(x_dense, c_true, d_true) - dist.pdf(x_dense)))
max_cdf_diff = np.max(np.abs(burr_cdf(x_dense, c_true, d_true) - dist.cdf(x_dense)))
print("\nmax |pdf_numpy - pdf_scipy|:", max_pdf_diff)
print("max |cdf_numpy - cdf_scipy|:", max_cdf_diff)
pdf: [0.131687 0.75     0.263374 0.009373]
cdf: [0.012346 0.25     0.790123 0.98419 ]
sample min/max: 0.21869142304750552 14.36061158934588

True (c,d): (3.0, 2.0)
Fit  (c,d): (3.0007293936222674, 1.9803751910399106)
Returned loc/scale: (0, 1)

max |pdf_numpy - pdf_scipy|: 5.551115123125783e-16
max |cdf_numpy - cdf_scipy|: 2.220446049250313e-16
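
As one more consistency check, the loc/scale convention from Section 3 can be verified directly (the values of loc and scale here are arbitrary):

# Check the loc/scale convention against the standard-form helper:
# f_X(x) = f_Y((x - loc)/scale) / scale for x > loc.
loc, scale = 1.0, 2.0
x_ls = np.array([1.5, 2.0, 4.0, 10.0])
pdf_scipy_ls = stats.burr.pdf(x_ls, c_true, d_true, loc=loc, scale=scale)
pdf_manual = burr_pdf((x_ls - loc) / scale, c_true, d_true) / scale
print("max |loc/scale pdf diff|:", np.max(np.abs(pdf_scipy_ls - pdf_manual)))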

10) Statistical Use Cases#

Hypothesis testing#

  • Nested model test: the case \(d=1\) is log-logistic. You can test \(H_0: d=1\) vs \(H_1: d \ne 1\) using a likelihood-ratio test (LRT).

  • Goodness-of-fit: QQ-plots or distribution tests (KS/AD) can be used as diagnostics. Be careful: classical p-values assume the parameters are known, while in practice they are often estimated from the same data (a KS sketch follows).
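
A minimal KS diagnostic sketch (synthetic data; note the caveat above about estimated parameters):

# KS goodness-of-fit sketch. Caveat: the p-value is optimistic because
# (c, d) are estimated from the same data used in the test.
rng_gof = np.random.default_rng(7)
x_gof = stats.burr.rvs(3.0, 2.0, size=800, random_state=rng_gof)
c_g, d_g, _, _ = stats.burr.fit(x_gof, floc=0, fscale=1)
ks = stats.kstest(x_gof, "burr", args=(c_g, d_g))
print("fitted (c,d):", (float(c_g), float(d_g)))
print("KS statistic:", ks.statistic, " p-value:", ks.pvalue)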

Bayesian modeling#

When modeling positive heavy-tailed data, you can use Burr III as a likelihood and place priors on \(c\) and \(d\) (e.g., log-normal or log-uniform priors). There is no conjugate prior, but posterior inference is straightforward with MCMC or even a simple grid approximation in low dimensions.

Generative modeling#

  • Useful as a base distribution for positive heavy-tailed generative models.

  • Can be used in mixtures (mixture of Burrs) to model multimodal positive data (a small sampler is sketched after this list).

  • Reciprocal relationship to Burr XII can be exploited depending on whether your domain is naturally modeled by \(X\) or \(1/X\).
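
A minimal sketch of the mixture idea (the component parameters and weight are illustrative, not fitted to anything):

# Two-component Burr mixture sampler (illustrative parameters).
rng_mix = np.random.default_rng(0)
n = 10_000
w = 0.6                                  # mixing weight of component 1
comp1 = rng_mix.random(n) < w
samples_mix = np.where(
    comp1,
    burr_rvs_numpy(4.0, 5.0, size=n, rng=rng_mix),   # narrower "body" component
    burr_rvs_numpy(2.0, 0.5, size=n, rng=rng_mix),   # heavier-tailed component
)
print("mixture quartiles:", np.quantile(samples_mix, [0.25, 0.5, 0.75]))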

# Likelihood-ratio test example: H0: d=1 (log-logistic) vs H1: d free

c0, d0 = 2.5, 1.0
x = stats.burr.rvs(c0, d0, size=1500, random_state=rng)

# Fit under H1 (free c, d) and H0 (d fixed to 1). Fix loc=0, scale=1 for simplicity.
c_hat1, d_hat1, _, _ = stats.burr.fit(x, floc=0, fscale=1)
c_hat0, d_hat0, _, _ = stats.burr.fit(x, f1=1.0, floc=0, fscale=1)  # d fixed

ll1 = np.sum(stats.burr.logpdf(x, c_hat1, d_hat1))
ll0 = np.sum(stats.burr.logpdf(x, c_hat0, 1.0))

lrt_stat = 2 * (ll1 - ll0)
p_value = stats.chi2.sf(lrt_stat, df=1)

print("True params:", (c0, d0))
print("Fit H1 (c,d):", (c_hat1, d_hat1))
print("Fit H0 (c,d=1):", (c_hat0, 1.0))
print("LRT stat:", float(lrt_stat))
print("Approx p-value (chi^2_1):", float(p_value))
True params: (2.5, 1.0)
Fit H1 (c,d): (2.547213421880941, 1.025130683635059)
Fit H0 (c,d=1): (2.5686523437500033, 1.0)
LRT stat: 0.7672483125884355
Approx p-value (chi^2_1): 0.3810696572372613
# Simple Bayesian grid posterior over (c,d) with a log-uniform prior p(c,d) ∝ 1/(c d)
# This is an approximation for intuition (not a replacement for MCMC for serious work).

c_true, d_true = 3.0, 2.0
data = stats.burr.rvs(c_true, d_true, size=400, random_state=rng)
logx = np.log(data)
sum_logx = logx.sum()
n = data.size

c_grid = np.linspace(1.1, 6.0, 90)   # avoid c<=1 where mean diverges
d_grid = np.linspace(0.2, 6.0, 90)

log_post = np.empty((c_grid.size, d_grid.size), dtype=float)
for i, c in enumerate(c_grid):
    # sum_i log(1 + x_i^{-c}) computed stably
    s = np.logaddexp(0.0, -c * logx).sum()

    # log-likelihood for each d (vectorized)
    loglike = n * np.log(c) + n * np.log(d_grid) + (-c - 1.0) * sum_logx - (d_grid + 1.0) * s

    # log-uniform prior on (c,d) over the grid bounds
    logprior = -np.log(c) - np.log(d_grid)
    log_post[i, :] = loglike + logprior

# Normalize on the discrete grid (treating cells as equal-area for visualization)
log_post -= logsumexp(log_post)
post = np.exp(log_post)

i_map, j_map = np.unravel_index(np.argmax(post), post.shape)
print("True (c,d):", (c_true, d_true))
print("MAP  (c,d):", (float(c_grid[i_map]), float(d_grid[j_map])))

fig_post = go.Figure(
    data=go.Contour(
        x=d_grid,
        y=c_grid,
        z=post,
        contours_coloring="heatmap",
        colorbar_title="posterior",
    )
)
fig_post.update_layout(title="Grid posterior p(c,d | data) with log-uniform prior")
fig_post.update_xaxes(title="d")
fig_post.update_yaxes(title="c")
fig_post.show()
True (c,d): (3.0, 2.0)
MAP  (c,d): (2.971910112359551, 1.8943820224719101)

11) Pitfalls#

  • Invalid parameters: require \(c>0\) and \(d>0\). With location/scale, also require \(\mathrm{scale}>0\).

  • Support mismatch: data must satisfy \(x>0\) in the standard form, and \(x>\mathrm{loc}\) with a location shift.

  • Non-existent moments: the \(k\)-th moment exists iff \(k<c\).

    • If \(c\le 1\), the mean diverges.

    • If \(c\le 2\), the variance diverges.

  • Near-zero behavior: if \(c d < 1\), the PDF diverges at 0 (this can be fine mathematically, but it affects numerics and interpretation).

  • Numerical issues: direct computation of \(x^{-c}\) can overflow for tiny \(x\). Prefer log-domain computations (as in burr_logpdf; a short demo follows this list).

  • Fitting:

    • MLE can be sensitive to starting points and to whether loc/scale are also being fit.

    • Use logpdf for likelihood work to avoid underflow.

  • Name confusion: Burr has multiple “types”. SciPy burr is Type III (Dagum), while SciPy burr12 is Type XII.
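
A short demonstration of the overflow pitfall (the naive formula emits RuntimeWarnings and returns nan, while the log-domain helper stays finite):

# Overflow pitfall demo: the naive PDF formula produces inf/inf = nan for tiny x.
c_n, d_n = 3.0, 2.0
x_tiny = np.array([1e-150])
naive = c_n * d_n * x_tiny ** (-c_n - 1.0) / (1.0 + x_tiny ** (-c_n)) ** (d_n + 1.0)
print("naive pdf:         ", naive[0])                          # nan
print("log-domain log-pdf:", burr_logpdf(x_tiny, c_n, d_n)[0])  # finite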

12) Summary#

  • burr (Burr Type III / Dagum) is a flexible positive, heavy-tailed continuous distribution.

  • The CDF is \(F(x)=(1+x^{-c})^{-d}\) and the PDF is \(f(x)=cd\,x^{-c-1}/(1+x^{-c})^{d+1}\).

  • Tail heaviness is controlled by \(c\): moments exist iff \(k<c\).

  • The transform \(T=X^{-c}\) yields a Lomax distribution, making derivations and sampling straightforward.

  • Sampling is easy via inverse CDF: \(X=(U^{-1/d}-1)^{-1/c}\).

  • SciPy integration is direct via scipy.stats.burr (and burr12 for Burr Type XII).